* Corresponding author: marcantoniomatteo@gmail.com
Rao’s Q Index, Time-Weighted Dynamic Time Warping, Landscape Heterogeneity, Remote Sensing, Time Series, Phenology
During the 2024 B-Cubed Hackathon, we extended the R package
“rasterdiv” by incorporating Time-Weighted Dynamic Time
Warping (TWDTW) to the package’s pre-existing paRao()
function for the calculation of parametric Rao’s Quadratic Diversity
(Rao’s Q) index. The implementation within “rasterdiv” uses
the “twtwd” function from the TWDTW R package.
This enhances the user’s ability to accurately assess biodiversity when
using contemporary remote sensing tools like satellite images. As some
biodiversity indices (e.g. Shannon’s H) do not account for
spatio-temporal dynamics at all, and others (e.g. Rao’s Q) only include
the spatial dimension, significant variations in phenology are often
overlooked.
Through integrating TWDTW into the paRao() function,
users can better assess an ecosystem’s biodiversity by accounting for
phenological differences among its constituent flora. This is
particularly valuable for distinguishing between natural habitats and
artificial land cover types, which can lack phenological changes.
Previous studies have also found that the time weighting ability of
TWDTW enables the discernment of different floral community types which
could otherwise be misclassified as the same by traditional Dynamic Time
Warping (DTW) approaches.
To evaluate the efficacy of TWDTW within the paRao()
function, we compared the ability of TWDTW Rao’s Q index with other
biodiversity indices at classifying the different floral communities in
a disturbed grassland in Calabria, Italy. Our study used a time series
of Planet Phenological Index (PPI) data from the Sentinel-2 satellite
network. The results indicated that accounting for phenological cycles
can filter out artefacts and better classify floral communities. This
improves the ability to assess ecosystem health and resilience,
providing a more comprehensive understanding of biodiversity
dynamics.
We conclude that the inclusion of phenology in biodiversity
assessment is necessary, and that our modifications of
paRao() will be valuable to facilitate the accurate
detection and description of ecosystem trends in response to our
changing environment.
The B-Cubed Hackathon brought together computer scientists and
ecologists from a variety of institutions to rapidly create novel
informatics solutions to the biodiversity challenges facing the planet.
We identified that the addition of time-weighting to the R package
“rasterdiv” would be a worthwhile contribution to the
environmental informatics community. rasterdiv was created
to calculate biodiversity indices from numerical matrices in the R
programming environment.
Biodiversity indices included in rasterdiv currently
focus on the spatial component. Here we outline how our extension to the
pre-existing implementation of Rao’s Q diversity indices (Rocchini, Marcantonio, and Ricotta 2017) can
explicitly account for the temporal dimension of data, alongside the
relevant biological context to our extension.
Ecosystems with heterogeneous biota have been shown both experimentally and theoretically to provide greater utility to all the agents which comprise that ecosystem (Zhang et al. 2022). This is through the provision of more resources and a greater variety of niches available for species. This subsequently increases the value of ecosystem services provided to the people in communities surrounding an ecosystem. Heterogeneous ecosystems are typically also more resilient to disturbances they experience, likely because they have functional redundancy (Mace, Norris, and Fitter 2012). Due to the centrality of biodiversity to healthy ecosystem functioning, quantitative measures of biodiversity are required to understand how ecosystems are responding to ongoing environmental changes, such as shifting land use patterns and climate change.
Novel satellite remote sensing tools generate large amounts of data about the Earth’s surface at increasing rate, due to their almost constant temporal coverage and increasing spatial resolution (Frazier and Hemingway 2021). The “big data” generated by these systems need to be interpreted effectively to accurately detect and describe ecosystem trends, such as a change in ecosystem diversity. Consequently, information theory has been used to build the analytical tools which are now regularly used to assess remote sensing datasets. For example, Shannon’s H index, and other related indices only based on mathematical transformation of frequency data, have been widely used as a proxy for biodiversity. But these can be inadequate when applied to time series data generated by remote sensing platforms. In fact, these indices do not consider the distance between each sampled point (whether they are species incidences, pixels, or any other quantitative abstractions of an observation). This approach therefore treats all objects within a dataset as equidistant from one another, so measures of Shannon’s H value are prone to saturation, especially when only marginal differences are observed between the objects in a remote sensing dataset.
Rao’s Quadratic Diversity index (Rao’s Q) adds “space” (but not
necessarily geographical space) as a trait to its abstraction of
biodiversity by accounting for the distance between traits of the
observations within a study site. As a trait informed alternative to
Shannon’s H, Rao’s Q has been demonstrated experimentally to offer
greater efficacy when representing biodiversity in aerial remote sensing
datasets (Rocchini et al. 2021), for which
pixels are the discrete observation unit. However, Rao’s Q remains
limited by its inability to assess trait change over time. Current
implementations of the index (for example in the rasterdiv
R package (Rocchini et al. 2021)) only
assess one snapshot of the data at a time. We set out to overcome this
limitation by incorporating Time-Weighted Dynamic Time Warping (TWDTW)
to include time as a component of the distance variable within Rao’s
Q.
Dynamic Time Warping (DTW) is a mathematical approach used to compare data series when the timing of observations differs. It has been used in a variety of disciplines. DTW works by finding the smallest distance between two time series.
However, by minimising the differences in timing, biologically significant differences can also be obscured, such as when comparing plant phenology. For instance, many tree species require a minimum number of Growing Degree Hours (GDH) to commence their springtime budburst (Fu et al. 2019). Many other ecosystem processes, such as seasonal migrations and pollination, need to coincide with phenological events, so phenology timing represents an important differentiating factor for time series representing ecosystems with plants (Fitchett, Grab, and Thompson 2015).
The TWDTW approach rectifies this by imposing a “cost” when aligning observations (which in our study are pixels) with greater temporal separation. Therefore, the TWDTW function is less likely to match a time series to others which exhibit substantially different phenologies. This has been successfully demonstrated by Maus et al (V. Maus et al. 2016) to classify changing land use patterns in the Brazilian Amazon, and was a more effective tool than standard DTW when applied to heterogeneous biological environments like these.
In addition to the standard cost matrix of the DTW function, they also propose a method to implement a cost due to temporal separation of temporal trends (Equation 1). In Equation 1, \(\alpha\) is the steepness of the logistic function which is used to impose a distance penalty on temporal separation, and \(\beta\) is the midpoint of the curve, the threshold below which separation in time does not substantially impact the other components of the equation. Lastly, \(g(t_i,t_j)\) represents the time elapsed between the dates evaluated in the match (\(t_i\) and \(t_j\) times of the \(i\)th and \(j\)th observations respectively).
\[ \omega_{i,j} = \frac{1}{1 + e^{-\alpha(g(t_i,t_j) - \beta)}} \tag{1} \]
In this manuscript, we used Copernicus Sentinel-2 mission’s optical
data of a small, grazed grassland site in Calabria, Italy to demonstrate
and evaluate our R-based rasterdiv implementation of
phenology into Rao’s Q index. We also evaluated its efficacy in
comparison to the Shannon’s H and unmodified Rao’s Q indices.
rasterdiv:We implemented this method within the existing paRao()
function of the rasterdiv R package (Marcantonio, Thouverau, and Rocchini 2024). We
used the twtwd function from the twdtw R
package (Victor Maus 2023) which is a
wrapper for C++ implementations of TWDTW functions. In particular, the
twtwd method was implemented through the internal hidden
function .mtwdtw:
.mtwdtw <- function(x, time_vector = 0, stepness = -0.5, midpoint = 35, cycle_length = "year", time_scale = "day") {
twdtw::twdtw(
x = data.frame(time = time_vector, v = x[[1]]),
y = data.frame(time = time_vector, v = x[[2]]),
time_weight = c(stepness = stepness, midpoint = midpoint),
cycle_length = cycle_length,
time_scale = time_scale)
}
stored in the rasterdiv file mdist.R. The
paRao() function uses the "distm_m" argument
to internally call this function, applying it to pairwise pixel time
series (Z). Within each moving window, Rao’s Q index is
calculated using the computed distances and the specified
alpha value. By leveraging the TWDTW distance over a
time-series of matrices or images, Rao’s Q index can be calculated using
the following R function:
paRao(x = time.series,
time_vector = time,
window = 3,
alpha = 2,
na.tolerance = 0,
simplify = 3,
method = "multidimension",
dist_m = "twdtw",
midpoint = 35,
stepness = -0.5
)
The arguments and our input parameters of which are:
time.series is an (X, Y, Z) raster stack
(or “cube”) of spectral data, where the X and Y axes represent discrete
pixel values, and each layer of the Z axis is a a different temporal
snapshot of the raster layer. In our study, this is the Sentinel-2
derived time series (with daily Z temporal resolution) of
our study site.
time_vector A vector of dates corresponding to every
Z layer in the raster time series, which must be the same
as the Z axis from the x variable. All pixels
in the input time series must share the same temporal spacing as the
temporal pattern to which it is being compared (i.e. if the time series
has observations on days c(1, 3, 7, ...), then the pattern
it is being compared to must also have observations on days
c(1, 3, 7, ...). If any x, y
pixel i along the Z temporal vector has any
invalid numeric data (e.g., NA, NULL,
Inf, etc.), its Rao’s Q index value will be
NA, since DTW does not allow for no data values.
steepness A continuous numeric value corresponding to
the \(\alpha\) variable from the
time-weighting function in Maus et al (V. Maus et al. 2016). Different values of \(\alpha\) can increase or decrease
penalisation for deviations from the pattern time.
midpoint A numeric value corresponding to the \(\beta\) variable from the time-weighting
function in Maus et al (V. Maus et al.
2016). The input data must be of the unit specified by the
time_scale argument (e.g. in our example, it is expressed
in days).
cycle_length This argument indicates the length of a
single cycle. This argument can accept either a string or numeric value.
Valid string input values are “year”, “month”, “day”, “hour”, “minute”,
and “second”. String inputs are also passed to the
time_scale argument. Numeric input values must be the same
unit specified by the time_scale argument.
time_scale This argument sets the units of the TWDTW
function’s cycle length. Valid string input values are “year”, “month”,
“day”, “hour”, “minute”, and “second”. If the input given to
cycle_length is a string value, then this argument will
change the units given in the output. Alternatively, if a numeric value
is given to cycle_length, then this argument will compute
the elapsed time in seconds.
Other arguments remain unchanged from (Victor Maus 2023).
The study site was a small (6 hectare) area within the protected area of “Macchia Sacra” (Site of Community Importance (SCI): SICIT9310073). It was selected as it was suitable for thorough imaging by drone, which formed the basis of our ground-truthed biodiversity observations. After drone image acquisition, we defined eight types of plant communities within the study site, with the expertise in classification imparted by an expert botanist (Figure 1). The study site is characterized by the presence of a road on the north-east part of the site. From the level of the road the elevation declines to a lower part which features a sharp canyon running south to west, the result of a previous small stream which had dried up by the time of our drone survey. This part of the study site is characterized by hydrophilic vegetation. Between these two extremes is a small hill which culminates in a plateau. The plateau is the resting area of a herd of cows which graze in the area. This area is much dryer and subject to strong pasture pressure and mechanical disruption, but is more nutrient which, due to the presence of cow manure.
We used a time series of 144 Plant Phenological Index (PPI) images derived from Sentinel-2 imagery and available through the Copernicus Service High Resolution Vegetation Phenology and Productivity (HRVPP) (data acquired via: https://scihub.copernicus.eu/) encompassing all data which were available in 2023 (Figure 2). The dimensions of each image were 20 pixels vertically by 27 pixels horizontally, and each pixel was 10m2. The PPI index was chosen as it is minimally influenced by soil signal and the presence of shadows (Karkauskaite, Tagesson, and Fensholt 2017).